Dongcheng Zhao

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

May 12, 2026

From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation

May 12, 2026

Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron

Feb 02, 2026

TEFormer: Structured Bidirectional Temporal Enhancement Modeling in Spiking Transformers

Jan 26, 2026

Towards Reliable Evaluation of Adversarial Robustness for Spiking Neural Networks

Dec 27, 2025

Efficient LLM Safety Evaluation through Multi-Agent Debate

Nov 09, 2025

MVPBench: A Benchmark and Fine-Tuning Framework for Aligning Large Language Models with Diverse Human Values

Sep 09, 2025

PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks

May 22, 2025

STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking

May 16, 2025

Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence

May 15, 2025